---
title: "Project 1 - Happy Moment Analysis"
output: html_notebook
---

```{r, message=F,echo=F}
library(tm)
library(wordcloud)
library(wordcloud2)
library(dplyr)
library(tidytext)
library(ggplot2)
library(forcats)
library(topicmodels)
library(gridExtra)
library(tidyverse)
library(tidyr)
library(ggridges)
```
# Xinzhu Wang (xw2581)

What makes people happy? The question always deserves exploration. In this notebook I compare the happy moments reported by men and women, and the moments recalled from the past 24 hours with those recalled from the past 3 months. I also fit nine topics to the happy moments of each gender. I hope this analysis gives readers a better understanding of happiness and of how happy moments arise in everyone's daily life.

# I. Import and Clean data 

First, I import the original dataset, hm_orig, generated by the test-processing.rmd file. The hm dataset is the version from which I removed additional stopwords, which makes it more readable and more convenient to analyze. The first 6 rows of hm are shown below.

```{r,warning=F}
hm_orig <- read.csv("~/Documents/GitHub/Spring2019-Proj1-EvelynWangxz/output/processed_moments.csv")
hm <- read.csv("~/Documents/GitHub/Spring2019-Proj1-EvelynWangxz/output/processed_moments_new.csv")
hm_demo <- read.csv("~/Documents/GitHub/Spring2019-Proj1-EvelynWangxz/data/data/demographic.csv")
```

```{r}
head(hm,6)
```

```{r}
# Split the moments by the gender that each worker (wid) reported in the demographic file
hm_m <- filter(hm, wid %in% filter(hm_demo, gender == "m")$wid)
hm_f <- filter(hm, wid %in% filter(hm_demo, gender == "f")$wid)
```
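
The split by gender can also be done by attaching the gender as a column and splitting afterwards. The following is only a sketch on made-up data: `toy_hm` and `toy_demo` are hypothetical stand-ins for hm and hm_demo, and it assumes each wid appears once in the demographic table.

```r
# Toy stand-ins for the moments table and the demographic table (made-up values)
toy_hm <- data.frame(wid = c(1, 1, 2, 3), text = c("a", "b", "c", "d"))
toy_demo <- data.frame(wid = c(1, 2, 3), gender = c("m", "f", "m"))

# Attach each worker's gender to their moments, then split into one table per gender
joined <- merge(toy_hm, toy_demo, by = "wid")
by_gender <- split(joined, joined$gender)

print(sapply(by_gender, nrow))
```

Keeping gender as a column can be convenient later, since plots that use `fill = gender` need it anyway.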

Below is sample code used to clean the stopwords; the actual cleaning was done in the Text_Processing_new.rmd file. Based on the word frequency counts, I extended the stopword list with words such as "day", "time" and "finally", which are frequent but carry no meaning for the following analysis.

```{r stopwords, eval=FALSE}
# Not evaluated here: the cleaning itself was run in Text_Processing_new.rmd
data("stop_words")

word <- c("happy","ago","yesterday","lot","today","months","month",
                 "happier","happiest","last","week","past","day","time","finally","feel","enjoyed","moment","nice","favorite","hours","weekend","called","days","enjoy","excited","didnt")

stop_words <- stop_words %>%
  bind_rows(mutate(tibble(word), lexicon = "updated"))
```
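
As a rough illustration of what an extended stopword list does, the sketch below filters one made-up sentence against a three-word custom list. Both the sentence and the short list are assumptions for illustration; the real list is the one in the chunk above.

```r
# Toy example (hypothetical data): drop custom stopwords from a tokenized sentence
custom_stop <- c("day", "time", "finally")
tokens <- strsplit("finally spent time with my friend all day", " ", fixed = TRUE)[[1]]

# Keep only the tokens that are not in the custom stopword list
kept <- tokens[!tokens %in% custom_stop]
print(kept)
```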

# II. Exploratory Data Analysis

1. Word cloud of the data before removing the additional stopwords.
In this word cloud, the most common word is "friend", along with words like "day", "time", "family" and "home". Words like "day" and "time" carry no meaning here; they distort the results and crowd out the meaningful words. So I customize the stopword list and remove them.
```{r}
corpus_orig <- VCorpus(VectorSource(hm_orig$text))
tm_orig <- TermDocumentMatrix(corpus_orig)
tm.tidy_orig <- tidy(tm_orig)
tm1_orig <- summarise(group_by(tm.tidy_orig, term), sum(count))
```

```{r}
wordcloud(tm1_orig$term, tm1_orig$`sum(count)`, scale=c(2,0.5), max.words=200, min.freq=1, random.order=FALSE, rot.per=0.3, use.r.layout=T, random.color=T, colors=c("purple","pink","lightblue"))
```
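
The chunks above count how often each term occurs across all moments before drawing the cloud. A minimal base-R sketch of the same counting idea, on three made-up sentences that are not taken from the dataset:

```r
# Toy corpus (made-up sentences)
toy_docs <- c("played game with friend", "dinner with friend", "watched game")

# Split every sentence into words and count occurrences across the whole corpus
toy_words <- unlist(strsplit(toy_docs, " ", fixed = TRUE))
toy_freq <- sort(table(toy_words), decreasing = TRUE)

print(toy_freq)
```

The tm and tidytext pipeline in the notebook additionally keeps track of which document each term came from, which this sketch ignores.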

2. After removing the custom stopwords, I draw a new word cloud. Hovering over a word shows its frequency. "friend" is still the most frequent word. Words like "home", "family", "son", "watched", "game" and "birthday" also appear, which suggests that these happy moments often relate to family and leisure activities.
```{r}
corpus <- VCorpus(VectorSource(hm$text))
tm <- TermDocumentMatrix(corpus)
tm.tidy <- tidy(tm)
tm1 <- summarise(group_by(tm.tidy, term), sum(count))
```

```{r}
tm1 <- data.frame(tm1)
tm.red <- head(tm1[order(tm1[,2],decreasing = T),],100)
```

```{r}
wordcloud2(tm.red, size=0.7,color='random-light',fontWeight = 'bold',minRotation = -pi/3,maxRotation = pi/3,rotateRatio = 0.8)
```


3. After this overall look at the most frequent words, I focus on the differences between men and women. The two bar plots below compare the top 10 words for each gender. For both, "friend" is the most frequent source of happiness. The rest of the male top 10 shows that men draw more happiness from games, watching and playing, which differs clearly from the female list. Women more often mention family members such as their husband, son and daughter, and leisure words like "game" and "played" barely appear in the female top 10. The third bar plot, a direct comparison of the top 20 words for each gender, shows this more clearly.

```{r}
m.tidy <- tidy(TermDocumentMatrix(VCorpus(VectorSource(hm_m$text))))
m <- summarise(group_by(m.tidy, term), sum(count))
m <- data.frame(m)
m.red <- head(m[order(m[,2],decreasing = T),],10)
```

```{r}
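# After coord_flip() the terms run along the vertical axis and the counts along the horizontal axis
p1 <- ggplot(m.red, aes(x = reorder(term, sum.count.), y = sum.count.)) + geom_bar(stat = "identity", fill = "lightblue") + coord_flip() + xlab("term") + ylab("frequency") + ggtitle("male")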
```

```{r}
f.tidy <- tidy(TermDocumentMatrix(VCorpus(VectorSource(hm_f$text))))
f <- summarise(group_by(f.tidy, term), sum(count))
f <- data.frame(f)
f.red <- head(f[order(f[,2],decreasing = T),],10)
```

```{r}
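# After coord_flip() the terms run along the vertical axis and the counts along the horizontal axis
p2 <- ggplot(f.red, aes(x = reorder(term, sum.count.), y = sum.count.)) + geom_bar(stat = "identity", fill = "pink") + coord_flip() + xlab("term") + ylab("frequency") + ggtitle("female")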
```

```{r}
grid.arrange(p1, p2, nrow = 1, ncol = 2)
```

For most of the top 20 words, men and women use them with similar frequency; the exceptions are "friend" and the words that appear in only one gender's list. This suggests that women pay more attention to family, while men care more about leisure activities. The observation is interesting and intuitive; one possible explanation is that gender roles tie women more closely to family. Beyond that, friends matter to everyone, and spending time with friends is the main source of happiness for both genders.

```{r}
# Top 20 terms per gender; male counts are negated so the two sets of bars mirror around zero
m.red <- head(m[order(m[,2],decreasing = T),],20)
f.red <- head(f[order(f[,2],decreasing = T),],20)
m.red$sum.count. <- m.red$sum.count.*(-1)
temp1 <- rbind(m.red,f.red)
temp1$gender <- c(rep("m",20),rep("f",20))
# labels = abs prints the negated male counts as positive frequencies
ggplot(data = temp1, aes(x = reorder(term, -sum.count.), y = sum.count., group = gender, fill = gender)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels = abs, breaks = seq(-5000,5000,1000)) +
  xlab("term") + ylab("frequency") + coord_flip()
```
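
One way to list the words that appear in only one gender's top list is `setdiff()`. The sketch below uses two short made-up vectors, `top_m` and `top_f`, rather than the real m.red and f.red tables, so the output is illustrative only.

```r
# Hypothetical top-word lists for each gender (made-up, not the real results)
top_m <- c("friend", "game", "played", "watched", "family")
top_f <- c("friend", "family", "husband", "son", "daughter")

only_m <- setdiff(top_m, top_f)    # words only in the male list
only_f <- setdiff(top_f, top_m)    # words only in the female list
shared <- intersect(top_m, top_f)  # words in both lists

print(list(only_m = only_m, only_f = only_f, shared = shared))
```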

# III. LDA Analysis 
1. Using LDA to show the top 10 words of nine topics in the happy moments of each gender.
Having looked at the top 10 and top 20 words for men and women, I now show what topics these words form, which illustrates the makeup of happy moments for each gender.
For men, based on the plot below, I label the nine topics "family", "event", "entertainment", "life", "food", "shopping", "school", "achievement" and "work". Each label is based on the 10 words with the highest probability (beta) in that topic. For example, one topic contains "game", "played", "watched" and "video", so I label it entertainment. The remaining topics are labeled the same way. The nine topics are reasonable and cover most of people's daily lives.

```{r}
m.data <- hm_m %>% group_by(wid) %>% summarise(text = paste(text, collapse = " "))
m.lda <- LDA(DocumentTermMatrix(Corpus(VectorSource(m.data$text))), k = 9, control = list(seed = 2000))
m.lda.tidy <- tidy(m.lda)
m.top10 <- m.lda.tidy %>% group_by(topic) %>% top_n(10, beta) %>% ungroup() %>% arrange(topic, -beta)
```

```{r}
plot_m <- m.top10 %>%mutate(term = reorder(term, beta)) %>% ggplot(aes(term, beta, fill = factor(topic))) +geom_bar(stat = "identity", show.legend = T) +facet_wrap(~ topic, scales = "free") +coord_flip() + ggtitle("Male:Top 10 Words in Nine Topics")
plot_m
```
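
To make the labeling step more concrete, here is a small sketch that fits a two-topic LDA model to four made-up documents and prints the three most likely terms per topic with `terms()` from topicmodels. The documents, the choice of `k = 2` and the object names are assumptions for illustration; with so little text the topics may not separate cleanly.

```r
library(tm)
library(topicmodels)

# Made-up mini corpus: two leisure-like and two family-like documents
toy_text <- c("game played video game watched",
              "played game friend video",
              "son daughter husband family dinner",
              "family birthday son husband")

# Build a document-term matrix and fit LDA with a fixed seed for reproducibility
toy_dtm <- DocumentTermMatrix(VCorpus(VectorSource(toy_text)))
toy_fit <- LDA(toy_dtm, k = 2, control = list(seed = 2000))

# Matrix with one column per topic and one row per rank
toy_top_terms <- terms(toy_fit, 3)
print(toy_top_terms)
```

Reading down each column and asking what the words have in common is the same manual labeling used for the nine topics in this section.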

For women, based on the plot below, I label the nine topics "pet", "shopping", "family", "people", "event", "entertainment", "work", "life" and "people". Comparing the two sets of topics shows something interesting: the female topics include "pet", which has no counterpart among the male topics. This may indicate that women are more likely than men to draw happiness from pets. In addition, "shopping" appears as a topic for both genders, so the common assumption that women care more about shopping is not supported here; much of the shopping women mention may be for the family.

```{r}
f.data <- hm_f %>% group_by(wid) %>% summarise(text = paste(text, collapse = " "))
f.lda <- LDA(DocumentTermMatrix(Corpus(VectorSource(f.data$text))), k = 9, control = list(seed = 2000))
f.lda.tidy <- tidy(f.lda)
f.top10 <- f.lda.tidy %>% group_by(topic) %>% top_n(10, beta) %>% ungroup() %>% arrange(topic, -beta)
```

```{r}
plot_f <- f.top10 %>%mutate(term = reorder(term, beta)) %>% ggplot(aes(term, beta, fill = factor(topic))) +geom_bar(stat = "identity", show.legend = T) +facet_wrap(~ topic, scales = "free") +coord_flip() + ggtitle("Female:Top 10 Words in Nine Topics")
plot_f
```

# IV. Time Effect Analysis
Time affects what people recall as happy moments. I therefore analyze how the moments shift between the past 24 hours and the past 3 months for each gender.

1. Differences of happy moments between male and female for the past 24 hours

```{r}
hm.m24 <- hm_m[hm_m$reflection_period=="24h",]
hm.f24 <- hm_f[hm_f$reflection_period=="24h",]
hm.m24$gender <- "m"
hm.f24$gender <- "f"
hm24 <- rbind(hm.m24,hm.f24)
hm24_plot <- ggplot(hm24, aes(predicted_category, fill = gender)) + geom_bar() + ggtitle("Predicted Category Distribution for 24 hours") +coord_flip() 
```


2. Differences of happy moments between male and female for the past 3 months

```{r}
hm.m3 <- hm_m[hm_m$reflection_period=="3m",]
hm.f3 <- hm_f[hm_f$reflection_period=="3m",]
hm.m3$gender <- "m"
hm.f3$gender <- "f"
hm3 <- rbind(hm.m3,hm.f3)
hm3_plot <- ggplot(hm3, aes(predicted_category, fill = gender)) + geom_bar() + ggtitle("Predicted Category Distribution for 3 months") +coord_flip() 
```

The plots below show the differences between the past 24 hours and the past 3 months. Setting gender aside, affection is the most frequent category for the past 24 hours, and achievement is second, well ahead of the remaining categories. For the past 3 months the order reverses: achievement is the most frequent category and affection is second. Compared with the 24-hour data, the number of affection moments grows little, while the number of achievement moments grows sharply and the number of enjoyment moments does not grow at all. This pattern is striking, and it made me think about what lasting happiness is. Affection seems to be an emotional release that is remembered only for a short time, whereas achievement endures for both men and women and brings a more sustained happiness as time passes.
Within each category the ratio of men to women is similar, except for enjoyment in the past 24 hours, where men report more moments, for example from playing video games. Over 3 months the ratio for enjoyment becomes roughly half and half: men pay less attention to enjoyment and focus more on achievement.

```{r}
grid.arrange(hm24_plot, hm3_plot, nrow = 2, ncol = 1)
```
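
Because the bar plots show raw counts, the gender ratio within a category can be hard to read off. A minimal sketch of computing that ratio directly with `table()` and `prop.table()`, using a small made-up data frame (`toy_moments`) in place of the real 24-hour data:

```r
# Toy data (made-up): one row per happy moment
toy_moments <- data.frame(
  predicted_category = c("affection", "affection", "affection",
                         "achievement", "achievement", "leisure"),
  gender = c("f", "f", "m", "m", "f", "m")
)

# Count moments per category and gender, then convert each row to shares that sum to 1
toy_counts <- table(toy_moments$predicted_category, toy_moments$gender)
toy_share <- prop.table(toy_counts, margin = 1)

print(round(toy_share, 2))
```

In ggplot2 the same view can be obtained by passing `position = "fill"` to `geom_bar()`, which rescales every bar to the same length.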

3. Heatmap
The heatmap below compares the time effect on each category for men and women more directly. A darker cell means the category makes up a larger share of that group's happy moments. Affection and achievement stand out from the rest. Exercise has the smallest share overall and is visible only as a small proportion for men in the past 24 hours, which weakly suggests that men care more about exercise than women do.

```{r}
# Tag each subset with its gender/period group
hm.m24$time <- "m24"
hm.f24$time <- "f24"
hm.m3$time <- "m3"
hm.f3$time <- "f3"
hm.time <- rbind(hm.m24,hm.f24,hm.m3,hm.f3)

categories <- c("affection","leisure","enjoy_the_moment","achievement","bonding","nature","exercise")

# Share of a subset's moments that fall in each predicted category
prop <- function(data){
  vapply(categories,
         function(k) sum(data$predicted_category == k, na.rm = TRUE)/nrow(data),
         numeric(1), USE.NAMES = FALSE)
}

# One row per group/category pair, in the order f24, m24, f3, m3
df <- data.frame(
  prop = c(prop(hm.f24), prop(hm.m24), prop(hm.f3), prop(hm.m3)),
  time = rep(c("f24","m24","f3","m3"), each = length(categories)),
  topic = rep(categories, times = 4)
)
```

```{r}
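# Proportions are non-negative, so a sequential white-to-red scale is used: darker red means a larger share
ggplot(df, aes(x=time,y=topic,fill=prop)) + geom_tile() + scale_fill_gradient("proportion", low = "white", high = "red")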
```
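
A quick sanity check for a table of proportions like the one behind the heatmap is that the values within each group sum to 1, provided every moment falls into one of the listed categories. The sketch below runs that check on a made-up table, `toy_df`, with two groups and three categories; the numbers are not from the dataset.

```r
# Toy proportion table (made-up values) with the same three columns as the heatmap data
toy_df <- data.frame(
  prop = c(0.5, 0.3, 0.2, 0.6, 0.3, 0.1),
  time = rep(c("f24", "m24"), each = 3),
  topic = rep(c("affection", "achievement", "leisure"), times = 2)
)

# Sum the proportions within each group; every total should be (numerically) 1
toy_totals <- tapply(toy_df$prop, toy_df$time, sum)
print(toy_totals)
```

If a group's total falls clearly below 1, some moments in that group have a category outside the list or a missing category.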

# V. Conclusion

All in all, the analysis gives an overall picture of how happy moments differ between men and women and across time frames. The top 10 words of the two gender groups are broadly similar, but men focus more on entertainment while women pay more attention to family. Among the nine topics fitted for each gender, the female topics include "pet", which has no counterpart among the male topics, suggesting that women may be more interested in pets than men. Finally, the most frequent category changes with the time frame: over the past 24 hours affection dominates for both men and women, but over the past 3 months achievement catches up and overtakes it. I hope this analysis gives readers a more comprehensive and deeper understanding of the happy moments in life.
